Using differentially expressed genes from pre/post exercise human muscle, SignatureSearch was used to find drugs of similar perturbation signatures as possible candidates for Cortes R01 prelim data analysis for “exercise mimetics” for Alzheimers Disease.
Dataset: GSE108643 (RNAseq counts normalized and DEG analysis in R21_Prelim_210304 markdown)
GitHub: https://github.com/lizzyjoan/PreliminaryR21
System which operations were done on: MacBook Pro (16-inch, 2019), Processor 2.4 GHz 8-Core Intel Core i9, Memory 64 GB 2667 MHz DDR4, Graphics AMD Radeon Pro 5300M 4 GB Intel UHD Graphics 630 1536 MB
#reference database
eh <- ExperimentHub()
## snapshotDate(): 2020-10-27
cmap <- eh[["EH3223"]]; cmap_expr <- eh[["EH3224"]]
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
#DESeq2 results
human_deseq2_res <- read.csv("210518_Human_DESeq_Res.csv", header = TRUE, row.names = 1)
#LFC non NA genes
human_LFC <- human_deseq2_res[!is.na(human_deseq2_res$log2FoldChange),]
#Get rid of specific transcript .X numbers
rownames(human_LFC) <- sub("\\..*", "", rownames(human_LFC))
Find up and down genes
#showing up empty for LFC > 2 and p-adj < 0.05, so changed LFC to 1.5
human_up_df <- human_LFC[human_LFC$log2FoldChange >1 & human_LFC$padj <0.05,]
human_up_list <- rownames(human_up_df)
# Jen used mapIds but it sounds like it could be the same as select(), but mapIds gives error: Error in .processFilterParam(keys, x) : 'filter' has to be an 'AnnotationFilter', a list of 'AnnotationFilter' object, an 'AnnotationFilterList' or a valid filter expression!
human_up_list <- mapIds(EnsDb.Hsapiens.v75, keys = human_up_list, column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 138 of 252 requested IDs.
#correlation might be better method because the LFC?
human_down_df <- human_LFC[human_LFC$log2FoldChange < -1 & human_LFC$padj <0.05,]
human_down_list <- rownames(human_down_df)
human_down_list <- mapIds(EnsDb.Hsapiens.v75, keys= human_down_list , column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 42 of 54 requested IDs.
Lamb et al. (2006) introduced the gene expression-based search method known as Connectivity Map (CMap) where a GES database is searched with a query GES for similar entries (Lamb et al. 2006). Specifically, the GESS method from Lamb et al. (2006), here termed as CMAP, uses as query the two label sets of the most up- and down-regulated genes from a genome-wide expression experiment, while the reference database is composed of rank transformed expression profiles (e.g. ranks of LFC or z-scores). The actual GESS algorithm is based on a vectorized rank difference calculation. The resulting Connectivity Score expresses to what degree the query up/down gene sets are enriched on the top and bottom of the database entries, respectively. The search results are a list of perturbagens such as drugs that induce similar or opposing GESs as the query. Similar GESs suggest similar physiological effects of the corresponding perturbagens.
Function qSig() builds an object to store the query signature, reference database and GESS method used for GESS methods
qsig_cmap_human <- qSig(query = list(upset=as.character(human_up_list), downset=as.character(human_down_list)),
gess_method="CMAP", refdb= "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## 59 / 252 genes in up set share identifiers with reference database
## 6 / 54 genes in down set share identifiers with reference database
cmap <- gess_cmap(qSig= qsig_cmap_human, chunk_size=5000)
cmap_res <- result(cmap)[1:50,]
write.csv2(cmap_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/output/SignatureSearch/210607_cmap_res_human_revDESeq2contrasts.csv")
cmap_res
## # A tibble: 50 × 10
## pert PCID cell type trend raw_score scaled_score N_upset N_downset
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <int>
## 1 aminopentamide 22565 PC3 trt_… down -0.979 -1 59 6
## 2 BRD-K00203625 7029… A549 trt_… up 0.976 1 59 6
## 3 BRD-K65455015 4449… VCAP trt_… down -0.977 -0.999 59 6
## 4 cefmetazole 2366… MCF7 trt_… up 0.961 0.985 59 6
## 5 BRD-K86336638 5465… PHH trt_… down -0.945 -0.965 59 6
## 6 vitexin 6713… HA1E trt_… down -0.927 -0.948 59 6
## 7 BRD-K32301393 5465… MCF7 trt_… up 0.913 0.936 59 6
## 8 noreleagnine 1078… MCF7 trt_… down -0.916 -0.936 59 6
## 9 BRD-K98626176 5463… PC3 trt_… down -0.914 -0.934 59 6
## 10 tiaprofenic-acid 5468 PC3 trt_… down -0.907 -0.927 59 6
## # … with 40 more rows, and 1 more variable: t_gn_sym <chr>
This table contains the search results for each perturbagen (here drugs) in the reference database ranked by their signature similarity to the query. For the CMAP method, the similarity metrics are raw_score and scaled_score. The raw score represents the bi-directional enrichment score (Kolmogorov-Smirnov statistic) for a given up/down query signature. Under the scaled_score column, the raw_score has been scaled to values from 1 to -1 by dividing positive scores and negative scores with the maximum positive score and the absolute value of the minimum negative score, respectively. The remaining columns in the search result table contain the following information. pert: name of perturbagen (e.g. drug) in the reference database; cell: acronym of cell type; type: perturbation type, e.g. compound treatment is trt_cp; trend: up or down when reference signature is positively or negatively connected with the query signature, respectively; N_upset or N_downset: number of genes in the query up or down sets, respectively; t_gn_sym: gene symbols of the corresponding drug targets.
Visualize GESS Results
#drugs_top15 <- unique(cmap_res$pert)[1:15]
drugs_top30 <- unique(result(cmap)$pert)[1:30]
gess_res_vis(cmap_res, drugs = drugs_top30, col = "scaled_score")
CMAP TSEA
The following introduces how to perform TSEA (target set enrichment analysis) on drug-based GESS results using as functional annotation systems GO, KEGG and Reactome pathways. … The specialized enrichment algorithms include Duplication Adjusted Hypergeometric Test (dup_hyperG), Modified Gene Set Enrichment Analysis (mGSEA) and MeanAbs (mabs).
Internally, the latter converts the drug set to a target set, and then computes for it enrichment scores for each MF GO term based on the hypergeometric distribution. The enrichment results are stored in a feaResult object. It contains the organism information of the annotation system, and the ontology type of the GO annotation system. If the annotation system is KEGG, the latter will be “KEGG”. The object also stores the input drugs used for the enrichment test, as well as their target information.
tsea_kegg_cmap_human <- tsea_dup_hyperG(drugs = drugs_top30[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 16 drugs:
## aminopentamide / brd-k00203625 / brd-k65455015 / cefmetazole / brd-k86336638 / vitexin / brd-k32301393 / brd-k98626176 / tiaprofenic-acid / brd-k93410934 / brd-a39121618 / xmd-1185h / brd-a54194844 / oxotremorine-sesquifumarate / evoxine / brd-k88373420
## Loading required package: org.Hs.eg.db
##
## Reading KEGG annotation online:
##
## Reading KEGG annotation online:
result(tsea_kegg_cmap_human)
## # A tibble: 112 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Renin secr… 8/24 69/8086 1.22e-11 9.31e-10 3.12e-10 775/77… 8
## 2 hsa04… cGMP-PKG s… 10/24 167/80… 1.65e-11 9.31e-10 3.12e-10 151/15… 10
## 3 hsa04… Adrenergic… 9/24 150/80… 2.11e-10 6.04e- 9 2.03e- 9 775/77… 9
## 4 hsa04… Aldosteron… 8/24 98/8086 2.19e-10 6.04e- 9 2.03e- 9 775/77… 8
## 5 hsa04… Oxytocin s… 9/24 154/80… 2.67e-10 6.04e- 9 2.03e- 9 775/77… 9
## 6 hsa05… Amphetamin… 7/24 69/8086 7.45e-10 1.40e- 8 4.70e- 9 4128/4… 7
## 7 hsa04… Vascular s… 8/24 133/80… 2.56e- 9 4.13e- 8 1.38e- 8 775/77… 8
## 8 hsa05… Hypertroph… 7/24 90/8086 4.96e- 9 7.00e- 8 2.35e- 8 775/77… 7
## 9 hsa04… GnRH signa… 7/24 93/8086 6.25e- 9 7.85e- 8 2.63e- 8 775/77… 7
## 10 hsa04… Serotonerg… 7/24 115/80… 2.78e- 8 3.14e- 7 1.05e- 7 4128/4… 7
## # … with 102 more rows
cmap_res_down <- cmap_res[cmap_res$trend =="down",]
cmap_res_up <- cmap_res[cmap_res$trend =="up",]
down_drug_cmap <- unique(cmap_res_down$pert)[1:20]
up_drug_cmap <- unique(cmap_res_up$pert)[1:20]
gess_res_vis(cmap_res, drugs = up_drug_cmap, col = "scaled_score")
TSEA Network Visualization
dtnetplot(drugs = drugs(tsea_kegg_cmap_human), set = "hsa00140",
desc="Steroid hormone biosynthesis")
## No targets found in DrugBank/CLUE/STITCH database for 16 drugs:
## aminopentamide / brd-k00203625 / brd-k65455015 / cefmetazole / brd-k86336638 / vitexin / brd-k32301393 / brd-k98626176 / tiaprofenic-acid / brd-k93410934 / brd-a39121618 / xmd-1185h / brd-a54194844 / oxotremorine-sesquifumarate / evoxine / brd-k88373420
#u).
CMAP DSEA
Instead of translating ranked lists of drugs into target sets, as for TSEA, the functional annotation categories of the targets can be assigned to the drugs directly to perform Drug Set Enrichment Analysis (DSEA) instead. Since the drug lists from GESS results are usually unique, this strategy overcomes the duplication problem of the TSEA approach. This way classical enrichment methods, such as GSEA or tests based on the hypergeometric distribution, can be readily applied without major modifications to the underlying statistical methods. As explained above, TSEA and DSEA performed with the same enrichment statistics are not expected to generate identical results. Rather they often complement each other’s strengths and weaknesses.
dsea_res_cmap <- dsea_hyperG(drugs = drugs_top30[1:20], type = "KEGG",
pvalueCutoff = 0.05, qvalueCutoff = 0.05,
minGSSize = 10, maxGSSize = 2000)
result(dsea_res_cmap)
## # A tibble: 0 × 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
write.csv2(result(dsea_res_cmap), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs//210526_dsea_cmap_res_human.csv")
#five top pathways KEGG upregulated or downregulated, KEGG usually has maps that relate to pathways, but some geens are linked to multiple pathways
#Are there genes that are linked to linked to multiple KEGG pathways?
DSEA Network Visualization
dtnetplot(drugs = drugs(dsea_res_cmap), set = "hsa00140",
desc="Steroid hormone biosynthesis")
## No targets found in DrugBank/CLUE/STITCH database for 16 drugs:
## aminopentamide / brd-k00203625 / brd-k65455015 / cefmetazole / brd-k86336638 / vitexin / brd-k32301393 / brd-k98626176 / tiaprofenic-acid / brd-k93410934 / brd-a39121618 / xmd-1185h / brd-a54194844 / oxotremorine-sesquifumarate / evoxine / brd-k88373420
Subramanian et al. (2017) introduced a more complex GESS algorithm, here referred to as LINCS. While related to CMAP, there are several important differences among the two approaches. First, LINCS weights the query genes based on the corresponding differential expression scores of the GEPs in the reference database (e.g. LFC or z-scores). Thus, the reference database used by LINCS needs to store the actual score values rather than their ranks. Another relevant difference is that the LINCS algorithm uses a bi-directional weighted Kolmogorov-Smirnov enrichment statistic (ES) as similarity metric.
qsig_lincs_human <- qSig(query =list(upset=as.character(human_up_list), downset=as.character(human_down_list)),
gess_method="LINCS", refdb="lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## 59 / 252 genes in up set share identifiers with reference database
## 6 / 54 genes in down set share identifiers with reference database
The similarity scores stored in the LINCS result table are summarized here. WTCS: Weighted Connectivity Score; WTCS_Pval: nominal p-value of WTCS; WTCS_FDR: false discovery rate of WTCS_Pval; NCS: normalized connectivity score; NCSct: NCS summarized across cell types; Tau: enrichment score standardized for a given database. The latter is only included in the result table if tau=TRUE in a gess_lincs function call. The example given is run with tau=FALSE, because the tau values are only meaningful when the complete LINCS database is used which is not the case for the toy database.
The following provides a more detailed description of the similarity scores computed by the LINCS method. Additional details are available in the Supplementary Material Section of the Subramanian et al. (2017) paper.
WTCS: The Weighted Connectivity Score is a bi-directional ES for an up/down query set. If the ES values of an up set and a down set are of different signs, then WTCS is (ESup-ESdown)/2, otherwise, it is 0. WTCS values range from -1 to 1. They are positive or negative for signatures that are positively or inversely related, respectively, and close to zero for signatures that are unrelated.
WTCS_Pval and WTCS_FDR: The nominal p-value of the WTCS and the corresponding false discovery rate (FDR) are computed by comparing the WTCS against a null distribution of WTCS values obtained from a large number of random queries (e.g. 1000).
NCS: To make connectivity scores comparable across cell types and perturbation types, the scores are normalized. Given a vector of WTCS values w resulting from a query, the values are normalized within each cell line c and perturbagen type t to obtain the Normalized Connectivity Score (NCS) by dividing the WTCS value by the signed mean of the WTCS values within the subset of signatures in the reference database corresponding to c and t.
NCSct: The NCS is summarized across cell types as follows. Given a vector of NCS values for perturbagen p, relative to query q, across all cell lines c in which p was profiled, a cell-summarized connectivity score is obtained using a maximum quantile statistic. It compares the 67 and 33 quantiles of NCSp,c and retains whichever is of higher absolute magnitude.
Tau: The standardized score Tau compares an observed NCS to a large set of NCS values that have been pre-computed for a specific reference database. The query results are scored with Tau as a standardized measure ranging from 100 to -100. A Tau of 90 indicates that only 10% of reference perturbations exhibit stronger connectivity to the query. This way one can make more meaningful comparisons across query results.
lincs_human <- gess_lincs(qsig_lincs_human, sortby="NCS", tau=TRUE)
result(lincs_human)[1:50,]
## # A tibble: 50 × 15
## pert PCID cell type trend WTCS WTCS_Pval WTCS_FDR NCS Tau
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 BRD-K863366… 546541… PHH trt_… down -0.617 0.0000167 9.53e-5 -1.77 -99.9
## 2 BRD-K904461… 445011… PC3 trt_… up 0.641 0.0000156 9.53e-5 1.75 99.4
## 3 PD-173074 1401 PHH trt_… up 0.619 0.0000166 9.53e-5 1.73 99.3
## 4 garcinol 737075… NPC trt_… down -0.609 0.0000170 9.53e-5 -1.73 -99.5
## 5 GW-3965 447905 SKB trt_… up 0.636 0.0000158 9.53e-5 1.73 99.4
## 6 BRD-A391216… 2793096 MCF7 trt_… down -0.627 0.0000162 9.53e-5 -1.71 -99.6
## 7 ASN-05257430 650361 A549 trt_… down -0.608 0.0000171 9.53e-5 -1.71 -99.9
## 8 thiethylper… 5440 MCF7 trt_… down -0.616 0.0000167 9.53e-5 -1.68 -100.
## 9 geranylgera… 5281365 MCF7 trt_… down -0.616 0.0000167 9.53e-5 -1.68 -99.7
## 10 BRD-A741349… 3950572 HA1E trt_… down -0.632 0.0000160 9.53e-5 -1.68 -99.8
## # … with 40 more rows, and 5 more variables: TauRefSize <int>, NCSct <dbl>,
## # N_upset <int>, N_downset <int>, t_gn_sym <chr>
lincs_res <- result(lincs_human)
write.csv(lincs_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_human_res.csv")
drugs_lincs_top <- unique(lincs_res$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_top, col = "NCS")
Top “Down” Trends
lincs_res_down <- lincs_res[lincs_res$trend == "down",]
down_drug_lincs <- unique(lincs_res_down$pert)[1:20]
lincs_res_up <- lincs_res[lincs_res$trend == "up",]
up_drug_lincs <- unique(lincs_res_up$pert)[1:20]
drugs_lincs_topdown <- unique(lincs_res_down$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_topdown , col = "NCS")
TSEA
tsea_lincs_down <- tsea_dup_hyperG(drugs = drugs_lincs_topdown[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 12 drugs:
## brd-k86336638 / garcinol / brd-a39121618 / asn-05257430 / geranylgeraniol / brd-a74134989 / brd-k31302860 / rhapontin / brd-k93410934 / brd-k36591038 / brd-a02241946 / sinensetin
result(tsea_lincs_down)
## # A tibble: 50 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05… Nicotine a… 22/53 40/8086 6.08e-40 5.65e-38 4.42e-38 2555/2… 22
## 2 hsa04… Neuroactiv… 26/53 341/80… 2.41e-22 1.12e-20 8.74e-21 1812/1… 26
## 3 hsa05… Morphine a… 16/53 91/8086 1.78e-19 5.50e-18 4.30e-18 1812/2… 16
## 4 hsa04… GABAergic … 15/53 89/8086 5.51e-18 1.28e-16 1.00e-16 2555/2… 15
## 5 hsa04… Retrograde… 16/53 148/80… 5.80e-16 1.08e-14 8.43e-15 2555/2… 16
## 6 hsa04… Taste tran… 10/53 86/8086 1.46e-10 2.26e- 9 1.76e- 9 2555/2… 10
## 7 hsa05… Cocaine ad… 8/53 49/8086 7.22e-10 9.59e- 9 7.49e- 9 1812/2… 8
## 8 hsa05… Amphetamin… 8/53 69/8086 1.21e- 8 1.41e- 7 1.10e- 7 1812/2… 8
## 9 hsa04… Glutamater… 8/53 114/80… 6.37e- 7 6.58e- 6 5.14e- 6 2902/2… 8
## 10 hsa04… cAMP signa… 10/53 219/80… 1.22e- 6 1.13e- 5 8.84e- 6 1812/2… 10
## # … with 40 more rows
write.csv2(result(tsea_lincs_down), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_tsea.csv")
DSEA
dsea_res_lincs <- dsea_hyperG(drugs = drugs_lincs_topdown, type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(dsea_res_lincs)
## # A tibble: 0 × 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
#0 rows
Top “Up” Trends
drugs_lincs_topup <- unique(lincs_res_up$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_topup , col = "NCS")
TSEA “Up” Trends
tsea_lincs_up <- tsea_dup_hyperG(drugs = drugs_lincs_topup[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-k90446118 / badge / salicin / brd-a93015352 / brd-k91047982 / brd-k33928422 / brd-k31412180 / dibutyrylcyclic-gmp / brd-k79779816 / brd-k42687792 / hydrocotarnine / tetroquinone / brd-k08198779
result(tsea_lincs_up)
## # A tibble: 122 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04020 Calcium sig… 12/46 240/80… 5.65e-9 9.50e-7 7.14e-7 2263/… 12
## 2 hsa04015 Rap1 signal… 10/46 210/80… 2.03e-7 1.71e-5 1.28e-5 2263/… 10
## 3 hsa04151 PI3K-Akt si… 11/46 354/80… 3.25e-6 1.38e-4 1.04e-4 2263/… 11
## 4 hsa04010 MAPK signal… 10/46 294/80… 4.42e-6 1.38e-4 1.04e-4 2263/… 10
## 5 hsa04014 Ras signali… 9/46 232/80… 4.92e-6 1.38e-4 1.04e-4 2263/… 9
## 6 hsa01521 EGFR tyrosi… 6/46 79/8086 4.93e-6 1.38e-4 1.04e-4 2263/… 6
## 7 hsa05208 Chemical ca… 8/46 223/80… 3.12e-5 7.48e-4 5.63e-4 196/4… 8
## 8 hsa05230 Central car… 5/46 70/8086 4.38e-5 8.76e-4 6.58e-4 2263/… 5
## 9 hsa04520 Adherens ju… 5/46 71/8086 4.69e-5 8.76e-4 6.58e-4 60/14… 5
## 10 hsa04510 Focal adhes… 7/46 201/80… 1.24e-4 2.09e-3 1.57e-3 2321/… 7
## # … with 112 more rows
write.csv2(result(tsea_lincs_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_up_tsea.csv" )
DSEA “Up” Trends
dsea_lincs_up <- dsea_hyperG(drugs = drugs_lincs_topup, type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(dsea_lincs_up)
## # A tibble: 0 × 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
The Bioconductor gCMAP (Sandmann et al. 2014) package provides access to a related but not identical implementation of the original CMAP algorithm proposed by Lamb et al. (2006). It uses as query a rank transformed GEP and the reference database is composed of the labels of up and down regulated DEG sets. This is the opposite situation of the CMAP method, where the query is composed of the labels of up and down regulated DEGs and the database contains rank transformed GESs.
First, create matrix with zscores
#changed LFC > 2 to 1.5, too few genes otherwise
DESEQ2_sig_df <- human_LFC[abs(as.numeric(human_LFC$log2FoldChange)) >1.5 & human_LFC$padj <0.05,]
LOG <- as.numeric(DESEQ2_sig_df$log2FoldChange)
IDS <- mapIds(EnsDb.Hsapiens.v75, keys= rownames(DESEQ2_sig_df) , column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 40 of 126 requested IDs.
names(IDS) <- NULL
LOG_V2 <- LOG[!is.na(IDS)]
IDS_V2 <-IDS[!is.na(IDS)]
duplicated_genes <- unique(IDS_V2[duplicated(IDS_V2)])
names(LOG_V2) <- IDS_V2
#function Jen had used, did not need for my data, though
#fix<- c()
#for (i in 1:length(duplicated_genes)){
# group <- LOG_V2[names(LOG_V2) == duplicated_genes[i]]
# top <- group[group == max(abs(group))]
# fix <- c(fix, top)
#}
#fix
#continued data wrangling Jen used that I did not need to, skipped right to the last part (I did not have duplicates)
#LOG_V3 <- LOG_V2[!names(LOG_V2) %in% duplicated_genes]
#LOG_V4 <- c(LOG_V3, fix)
#LOG_V5 <- as.data.frame(LOG_V4)
#rownames(LOG_V5) <- names(LOG_V4)
#duplicated_genes is zero
LOG_V5 <- as.data.frame(LOG_V2)
rownames(LOG_V5) <- names(LOG_V2)
qsig_gcmap <- qSig(query = as.matrix(LOG_V5), gess_method = "gCMAP", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
gcmap <- gess_gcmap(qsig_gcmap, higher = 1, lower = -1)
result(gcmap)
## # A tibble: 45,956 × 10
## pert PCID cell type trend effect nSet nFound signed t_gn_sym
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl> <chr>
## 1 capsazepine 2733484 HCC515 trt_cp up 1 777 2 TRUE TRPV1; …
## 2 BRD-K95547711 5310786 A549 trt_cp down -1 936 2 TRUE <NA>
## 3 hydrocotarnine 3646 HT29 trt_cp down -0.994 988 2 TRUE <NA>
## 4 ZM-39923 3797 PC3 trt_cp down -0.994 458 2 TRUE <NA>
## 5 BRD-K65050353 2857183 NEU trt_cp down -0.994 1547 2 TRUE <NA>
## 6 RU-24969 108029 ASC trt_cp up 0.994 1278 2 TRUE HTR1B; …
## 7 BRD-A31429689 2998821 PC3 trt_cp up 0.994 525 2 TRUE <NA>
## 8 gingerol 442793 NEU trt_cp down -0.994 590 2 TRUE <NA>
## 9 austricine 6419904 HA1E trt_cp up 0.988 1791 3 TRUE <NA>
## 10 BRD-K58628167 3107687 VCAP trt_cp up 0.988 446 2 TRUE <NA>
## # … with 45,946 more rows
The columns in the corresponding search result table, that are specific to the gCMAP method, contain the following information. effect: scaled bi-directional enrichment score corresponding to the scaled_score under the CMAP result; nSet: number of genes in the reference gene sets after applying the higher and lower cutoff; nFound: number of genes in the reference gene sets that are present in the query signature; signed: whether the gene sets in the reference database have signs, e.g. representing up and down regulated genes when computing scores.
Visualize GESS Results
gcmap_res <- result(gcmap)
write.csv2(gcmap_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_res_human.csv")
drugs_top20_gcmap <- c(unique(gcmap_res$pert)[1:20])
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap, col = "effect")
gcmap_res_down <- gcmap_res[gcmap_res$trend == "down",]
down_drug_gcmap <- unique(gcmap_res_down$pert)[1:20]
gcmap_res_up <- gcmap_res[gcmap_res$trend == "up",]
up_drug_gcmap <- unique(gcmap_res_up$pert)[1:20]
drugs_top20 <- c(unique(gcmap_res_down$pert)[1:20], "temozolomide")
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap, col = "effect")
TSEA for gCMAP
dup_rct_res_gcmap_down <- tsea_dup_hyperG(drugs = drugs_top20_gcmap[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-k95547711 / hydrocotarnine / zm-39923 / brd-k65050353 / brd-a31429689 / gingerol / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k42867405 / brd-k02641134 / mw-stk33-3a
write.csv2(result(dup_rct_res_gcmap_down), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_tsea_down_human.csv")
result(dup_rct_res_gcmap_down)
## # A tibble: 27 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Neuroactiv… 11/15 341/80… 7.52e-13 2.03e-11 1.27e-11 7442/3… 11
## 2 hsa04… Serotonerg… 5/15 115/80… 1.43e- 6 1.93e- 5 1.20e- 5 3351/3… 5
## 3 hsa04… Calcium si… 6/15 240/80… 2.57e- 6 2.31e- 5 1.44e- 5 3357/3… 6
## 4 hsa04… cAMP signa… 4/15 219/80… 5.65e- 4 3.81e- 3 2.38e- 3 3351/3… 4
## 5 hsa04… Inflammato… 3/15 98/8086 7.07e- 4 3.82e- 3 2.38e- 3 7442/5… 3
## 6 hsa00… Arachidoni… 2/15 61/8086 5.52e- 3 2.48e- 2 1.55e- 2 6916/5… 2
## 7 hsa04… Gastric ac… 2/15 76/8086 8.46e- 3 3.26e- 2 2.03e- 2 887/25… 2
## 8 hsa04… Taste tran… 2/15 86/8086 1.07e- 2 3.36e- 2 2.10e- 2 3351/3… 2
## 9 hsa04… Gap juncti… 2/15 88/8086 1.12e- 2 3.36e- 2 2.10e- 2 3357/1… 2
## 10 hsa04… Pancreatic… 2/15 102/80… 1.49e- 2 4.01e- 2 2.50e- 2 5319/8… 2
## # … with 17 more rows
drugs_top20_gcmap_up <- c(unique(gcmap_res_up$pert)[1:20])
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap_up, col = "effect")
DSEA for gCMAP
dup_rct_res_gcmap_up <- tsea_dup_hyperG(drugs = drugs_top20_gcmap_up[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-a31429689 / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k62719675 / cortisone-acetate / brd-k76357752 / brd-k00551481 / brd-k33452357 / brd-k11778372 / brd-k26359746 / brd-k66703091
write.csv2(result(dup_rct_res_gcmap_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_tsea_up_human.csv")
result(dup_rct_res_gcmap_up)
## # A tibble: 56 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Adrenergic… 15/38 150/80… 5.60e-17 3.36e-15 1.36e-15 148/14… 15
## 2 hsa05… Arrhythmog… 12/38 77/8086 5.07e-16 1.52e-14 6.13e-15 775/77… 12
## 3 hsa04… Cardiac mu… 12/38 87/8086 2.37e-15 3.63e-14 1.47e-14 775/77… 12
## 4 hsa04… Oxytocin s… 14/38 154/80… 2.97e-15 3.63e-14 1.47e-14 775/77… 14
## 5 hsa04… MAPK signa… 17/38 294/80… 3.10e-15 3.63e-14 1.47e-14 773/77… 17
## 6 hsa05… Hypertroph… 12/38 90/8086 3.63e-15 3.63e-14 1.47e-14 775/77… 12
## 7 hsa05… Dilated ca… 12/38 96/8086 8.13e-15 6.97e-14 2.81e-14 775/77… 12
## 8 hsa04… Calcium si… 15/38 240/80… 6.67e-14 5.00e-13 2.02e-13 3357/3… 15
## 9 hsa04… Serotonerg… 11/38 115/80… 2.58e-12 1.72e-11 6.95e-12 3351/3… 11
## 10 hsa04… GnRH secre… 8/38 64/8086 3.98e-10 2.39e- 9 9.64e-10 775/77… 8
## # … with 46 more rows
DSEA
hyperG_k_res_gcmap_up <- dsea_hyperG(drugs = drugs_top20_gcmap_up[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_gcmap_up)
## # A tibble: 0 × 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
#0 results
dtnetplot(drugs = drugs(hyperG_k_res_gcmap_up), set = "hsa04921",
desc="Oxytocin signaling pathway")
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-a31429689 / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k62719675 / cortisone-acetate / brd-k76357752 / brd-k00551481 / brd-k33452357 / brd-k11778372 / brd-k26359746 / brd-k66703091
Fisher’s exact test (Graham J. G. Upton 1992) can also be used to search a GS-DB (gene set database) for entries that are similar to a GS-Q (gene set query). In this case both the query and the database are composed of gene label sets, such as DEG sets.
qsig_fisher <- qSig(query = as.matrix(LOG_V5), gess_method = "Fisher", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
fisher <- gess_fisher(qSig = qsig_fisher, higher = 1, lower = -1)
result(fisher)[1:50,]
## # A tibble: 50 × 13
## pert PCID cell type trend pval padj effect LOR nSet nFound signed
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 SA-1… 5465… NKDBA trt_… over 2.54e-6 0.0127 4.71 1.62 1806 18 FALSE
## 2 BRD-… 5664… PC3 trt_… over 3.01e-6 0.0150 4.67 1.54 2475 21 FALSE
## 3 cycl… 1640… PHH trt_… over 7.14e-6 0.0357 4.49 1.50 4468 28 FALSE
## 4 QL-X… 7370… HS57… trt_… over 1.91e-5 0.0478 4.27 1.44 4657 28 FALSE
## 5 K784… 4381… VCAP trt_… under 6.59e-5 0.0843 -3.99 -1.32 9826 20 FALSE
## 6 BRD-… 5333… PC3 trt_… over 6.74e-5 0.0843 3.99 1.71 732 10 FALSE
## 7 BRD-… 7816… MCF7 trt_… over 5.28e-5 0.0843 4.04 1.34 3640 24 FALSE
## 8 BRD-… 7632… MCF7 trt_… over 1.93e-5 0.0963 4.27 1.41 2530 20 FALSE
## 9 BRD-… 5664… PC3 trt_… over 1.14e-4 0.114 3.86 2.21 247 6 FALSE
## 10 HDAC… NotF… NEU trt_… over 4.40e-4 0.122 3.52 1.26 1701 14 FALSE
## # … with 40 more rows, and 1 more variable: t_gn_sym <chr>
The columns in the result table specific to the Fisher method include the following information. pval: p-value of the Fisher’s exact test; padj: p-value adjusted for multiple hypothesis testing using R’s p.adjust function with the Benjamini & Hochberg (BH) method; effect: z-score based on the standard normal distribution; LOR: log odds ratio.
Visualize GESS Results
write.csv2(result(fisher), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_fisher_res_human.csv")
fisher_res <- result(fisher)
drugs_top20_fisher <- c(unique(fisher_res$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher, col = "effect")
fisher_res_over <- fisher_res[fisher_res$trend=="over", ]
up_drug_fisher <- unique(fisher_res_over$pert)[1:20]
fisher_res_under <- fisher_res[fisher_res$trend=="under", ]
down_drug_fisher <- unique(fisher_res_under$pert)[1:20]
drugs_top20_fisher_over <- c(unique(fisher_res_over$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher_over, col = "effect")
Fisher TSEA
dup_rct_res_fisher_over <- tsea_dup_hyperG(drugs = drugs_top20_fisher_over[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## sa-1472623 / brd-k23980805 / brd-k30836161 / brd-k53932786 / brd-k50659736 / brd-k87371039 / hdac1-selective / isonicotinohydroxamic-acid / brd-k89097220 / cephalosporanic-acid / brd-k26767475 / brd-k86948282 / brd-k96704748
write.csv2(result(dup_rct_res_fisher_over), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_fisher_tsea_over_human.csv")
result(dup_rct_res_fisher_over)
## # A tibble: 97 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Serotonerg… 17/57 115/80… 1.03e-18 1.59e-16 1.07e-16 773/77… 17
## 2 hsa04… Calcium si… 15/57 240/80… 5.75e-11 4.45e- 9 2.99e- 9 10105/… 15
## 3 hsa04… MAPK signa… 16/57 294/80… 9.37e-11 4.84e- 9 3.25e- 9 5530/5… 16
## 4 hsa04… Adrenergic… 12/57 150/80… 3.68e-10 1.43e- 8 9.60e- 9 147/14… 12
## 5 hsa04… Oxytocin s… 11/57 154/80… 7.26e- 9 2.25e- 7 1.51e- 7 5530/5… 11
## 6 hsa05… Chemical c… 8/57 69/8086 2.20e- 8 5.67e- 7 3.81e- 7 1577/1… 8
## 7 hsa04… Type II di… 7/57 46/8086 2.56e- 8 5.67e- 7 3.81e- 7 2475/7… 7
## 8 hsa05… Arrhythmog… 8/57 77/8086 5.30e- 8 1.03e- 6 6.90e- 7 775/77… 8
## 9 hsa04… Taste tran… 8/57 86/8086 1.27e- 7 2.16e- 6 1.45e- 6 773/77… 8
## 10 hsa04… Cardiac mu… 8/57 87/8086 1.39e- 7 2.16e- 6 1.45e- 6 775/77… 8
## # … with 87 more rows
Fisher DSEA
hyperG_k_res_fisher_over <- dsea_hyperG(drugs = drugs_top20_fisher_over[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_fisher_over)
## # A tibble: 20 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05014 Amyotrophic… 4/7 1174/20… 3.28e-4 0.0186 0.0120 cyclo… 4
## 2 hsa05016 Huntington … 4/7 1185/20… 3.40e-4 0.0186 0.0120 cyclo… 4
## 3 hsa05012 Parkinson d… 4/7 1222/20… 3.83e-4 0.0186 0.0120 cyclo… 4
## 4 hsa05020 Prion disea… 4/7 1382/20… 6.14e-4 0.0224 0.0144 cyclo… 4
## 5 hsa00830 Retinol met… 3/7 732/204… 1.43e-3 0.0307 0.0197 cyclo… 3
## 6 hsa05204 Chemical ca… 3/7 732/204… 1.43e-3 0.0307 0.0197 cyclo… 3
## 7 hsa04720 Long-term p… 3/7 777/204… 1.70e-3 0.0307 0.0197 cyclo… 3
## 8 hsa02010 ABC transpo… 2/7 199/204… 1.91e-3 0.0307 0.0197 cyclo… 2
## 9 hsa00982 Drug metabo… 3/7 853/204… 2.23e-3 0.0307 0.0197 cyclo… 3
## 10 hsa00980 Metabolism … 3/7 869/204… 2.35e-3 0.0307 0.0197 cyclo… 3
## 11 hsa00140 Steroid hor… 3/7 906/204… 2.65e-3 0.0307 0.0197 cyclo… 3
## 12 hsa05017 Spinocerebe… 3/7 906/204… 2.65e-3 0.0307 0.0197 cyclo… 3
## 13 hsa05031 Amphetamine… 3/7 916/204… 2.73e-3 0.0307 0.0197 cyclo… 3
## 14 hsa05168 Herpes simp… 3/7 1122/20… 4.87e-3 0.0493 0.0317 ql-x-… 3
## 15 hsa04921 Oxytocin si… 3/7 1148/20… 5.20e-3 0.0493 0.0317 cyclo… 3
## 16 hsa04721 Synaptic ve… 2/7 357/204… 6.01e-3 0.0493 0.0317 verap… 2
## 17 hsa04935 Growth horm… 3/7 1217/20… 6.13e-3 0.0493 0.0317 ql-x-… 3
## 18 hsa04218 Cellular se… 3/7 1248/20… 6.58e-3 0.0493 0.0317 cyclo… 3
## 19 hsa01522 Endocrine r… 3/7 1259/20… 6.74e-3 0.0493 0.0317 cyclo… 3
## 20 hsa04932 Non-alcohol… 3/7 1260/20… 6.76e-3 0.0493 0.0317 verap… 3
drugs_top20_fisher_under <- c(unique(fisher_res_under$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher_under, col = "effect")
dup_rct_res_fisher_under <- tsea_dup_hyperG(drugs = drugs_top20_fisher_under[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 15 drugs:
## k784-3131 / oxetane / brd-k32111336 / brd-k65134192 / brd-k04722205 / bg-1021 / brd-k53932786 / ran-14b / brd-k66448556 / brd-k74606027 / gnf-2 / brd-k10806285 / ku-c103655 / brd-k41284916 / brd-k96194181
result(dup_rct_res_fisher_under)
## # A tibble: 183 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Dopaminerg… 11/30 132/80… 6.05e-13 1.13e-10 4.46e-11 7054/1… 11
## 2 hsa04… Fc epsilon… 7/30 68/8086 3.79e- 9 2.91e- 7 1.14e- 7 207/24… 7
## 3 hsa04… Prolactin … 7/30 70/8086 4.66e- 9 2.91e- 7 1.14e- 7 7054/2… 7
## 4 hsa05… Yersinia i… 8/30 137/80… 2.37e- 8 1.11e- 6 4.36e- 7 207/29… 8
## 5 hsa00… Folate bio… 5/30 26/8086 3.09e- 8 1.15e- 6 4.55e- 7 5053/5… 5
## 6 hsa04… T cell rec… 7/30 104/80… 7.59e- 8 2.36e- 6 9.31e- 7 207/29… 7
## 7 hsa05… Toxoplasmo… 7/30 112/80… 1.27e- 7 3.39e- 6 1.33e- 6 4843/2… 7
## 8 hsa04… Sphingolip… 7/30 119/80… 1.93e- 7 4.00e- 6 1.58e- 6 207/56… 7
## 9 hsa04… Growth hor… 7/30 119/80… 1.93e- 7 4.00e- 6 1.58e- 6 207/29… 7
## 10 hsa04… Osteoclast… 7/30 128/80… 3.18e- 7 5.70e- 6 2.24e- 6 5468/2… 7
## # … with 173 more rows
qsig_sp <- qSig(query = as.matrix(LOG_V5), gess_method = "Cor", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
sp <- gess_cor(qSig=qsig_sp, method="spearman")
result(sp)[1:50,]
## # A tibble: 50 × 7
## pert PCID cell type trend cor_score t_gn_sym
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 galantamine 9651 HT29 trt_cp down -0.618 ACHE; BCHE; CHRNA1; CH…
## 2 STK-064652 2878746 PC3 trt_cp up 0.573 <NA>
## 3 BRD-K42743267 54638412 NPC trt_cp down -0.552 <NA>
## 4 BRD-K56450018 54666500 PC3 trt_cp down -0.546 <NA>
## 5 BRD-K91063002 44501482 A375 trt_cp down -0.540 <NA>
## 6 BRD-K71339761 107410 FIBRNPC trt_cp down -0.533 <NA>
## 7 salbutamol 2083 VCAP trt_cp up 0.530 ADCY10; ADRB1; ADRB2; …
## 8 BRD-K48598367 54650132 NPC trt_cp up 0.529 <NA>
## 9 BRD-K90070572 49835840 VCAP trt_cp up 0.526 <NA>
## 10 BRD-K93623754 6404603 A375 trt_cp down -0.523 <NA>
## # … with 40 more rows
Visualize GESS Results
sp_res <- result(sp)
write.csv2(result(sp), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_spearman_res_human.csv")
drugs_top20_spearman <- c(unique(sp_res$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_spearman, col = "cor_score")
sp_res_up <- sp_res[sp_res$trend == "up",]
up_drug_sp <- unique(sp_res_up$pert)[1:20]
sp_res_down <- sp_res[sp_res$trend == "down",]
down_drug_sp <- unique(sp_res_down$pert)[1:20]
drugs_top20_sp_down <- c(unique(sp_res_down$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_sp_down, col = "cor_score")
Spearman TSEA
dup_rct_res_sp_down <- tsea_dup_hyperG(drugs = drugs_top20_sp_down[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 14 drugs:
## brd-k42743267 / brd-k56450018 / brd-k91063002 / brd-k71339761 / brd-k93623754 / brd-k67306351 / brd-k81408913 / brd-k99874282 / brd-k59488055 / cercosporin / brd-k52099002 / brd-k28162668 / brd-k97522823 / brd-k84024786
result(dup_rct_res_sp_down)
## # A tibble: 52 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Neuroactiv… 29/58 341/80… 4.10e-25 4.96e-23 3.97e-23 1134/5… 29
## 2 hsa05… Nicotine a… 9/58 40/8086 6.06e-12 3.67e-10 2.93e-10 1137/8… 9
## 3 hsa04… Insulin se… 11/58 86/8086 1.55e-11 4.83e-10 3.86e-10 3778/3… 11
## 4 hsa04… Cholinergi… 12/58 113/80… 1.60e-11 4.83e-10 3.86e-10 43/113… 12
## 5 hsa05… Chemical c… 12/58 212/80… 2.39e- 8 5.77e- 7 4.62e- 7 1136/1… 12
## 6 hsa04… cGMP-PKG s… 9/58 167/80… 2.46e- 6 4.97e- 5 3.98e- 5 4985/3… 9
## 7 hsa04… Leukocyte … 7/58 114/80… 1.53e- 5 2.53e- 4 2.02e- 4 1535/1… 7
## 8 hsa05… Leishmania… 6/58 77/8086 1.67e- 5 2.53e- 4 2.02e- 4 1535/1… 6
## 9 hsa05… Diabetic c… 8/58 203/80… 8.97e- 5 1.21e- 3 9.65e- 4 1535/1… 8
## 10 hsa05… Prion dise… 9/58 273/80… 1.25e- 4 1.51e- 3 1.21e- 3 1535/1… 9
## # … with 42 more rows
Spearman DSEA
hyperG_k_res_sp_down<- dsea_hyperG(drugs = drugs_top20_sp_down[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_sp_down)
## # A tibble: 4 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04972 Pancreatic … 4/6 857/20… 4.28e-5 0.00480 0.00437 dextro… 4
## 2 hsa04022 cGMP-PKG si… 4/6 1509/2… 3.92e-4 0.0171 0.0156 dextro… 4
## 3 hsa04911 Insulin sec… 3/6 595/20… 4.58e-4 0.0171 0.0156 micona… 3
## 4 hsa04970 Salivary se… 3/6 884/20… 1.46e-3 0.0408 0.0372 micona… 3
drugs_top20_sp_up <- c(unique(sp_res_up$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_sp_up, col = "cor_score")
dup_rct_res_sp_up <- tsea_dup_hyperG(drugs = drugs_top20[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs:
## brd-k95547711 / hydrocotarnine / zm-39923 / brd-k65050353 / gingerol / brd-k42867405 / brd-k02641134 / mw-stk33-3a / brd-k64106162 / brd-k12393722
write.csv2(result(dup_rct_res_sp_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_spearman_res_human.csv")
spearman_tsea_up <- result(dup_rct_res_sp_up)
spearman_tsea_up
## # A tibble: 197 × 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05… PD-L1 expr… 12/50 89/8086 1.27e-13 2.60e-11 8.71e-12 7535/3… 12
## 2 hsa05… Yersinia i… 11/50 137/80… 4.69e-10 4.78e- 8 1.60e- 8 2185/7… 11
## 3 hsa05… Hepatitis B 11/50 162/80… 2.83e- 9 1.92e- 7 6.45e- 8 1017/2… 11
## 4 hsa04… Prolactin … 8/50 70/8086 8.42e- 9 4.29e- 7 1.44e- 7 3717/2… 8
## 5 hsa04… T cell rec… 9/50 104/80… 1.10e- 8 4.48e- 7 1.50e- 7 7535/2… 9
## 6 hsa04… Growth hor… 9/50 119/80… 3.61e- 8 1.23e- 6 4.11e- 7 3717/2… 9
## 7 hsa05… Human immu… 11/50 212/80… 4.73e- 8 1.38e- 6 4.62e- 7 983/21… 11
## 8 hsa05… Lipid and … 11/50 215/80… 5.47e- 8 1.39e- 6 4.68e- 7 1559/3… 11
## 9 hsa04… Osteoclast… 9/50 128/80… 6.82e- 8 1.53e- 6 5.13e- 7 7297/2… 9
## 10 hsa04… Th1 and Th… 8/50 92/8086 7.49e- 8 1.53e- 6 5.13e- 7 7535/3… 8
## # … with 187 more rows
hyperG_k_res_sp_up <- dsea_hyperG(drugs = drugs_top20_sp_up[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_sp_up)
## # A tibble: 0 × 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
Looking for Drug Results in Two Methods
intersect(up_drug_cmap, up_drug_lincs)
## [1] "MRS-1845"
intersect(up_drug_cmap, up_drug_gcmap)
## character(0)
intersect(up_drug_cmap, up_drug_sp)
## character(0)
intersect(up_drug_cmap, up_drug_fisher)
## character(0)
intersect(up_drug_lincs, up_drug_gcmap)
## character(0)
intersect(up_drug_lincs, up_drug_sp)
## character(0)
intersect(up_drug_lincs, up_drug_fisher)
## character(0)
intersect(up_drug_gcmap, up_drug_sp)
## character(0)
intersect(up_drug_gcmap, up_drug_fisher)
## character(0)
intersect(up_drug_sp, up_drug_fisher)
## [1] "HDAC1-selective"
table_list = list("CMAP" = result(tsea_kegg_cmap_human), "LINCS" = result(tsea_lincs_up), "gCMAP" = result(dup_rct_res_gcmap_up), "Fisher" = result(dup_rct_res_fisher_over), "Spearman" = result(dup_rct_res_sp_up))
comp_fea_res(table_list, rank_stat = "pvalue", Nshow = 20)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.0 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::collapse() masks IRanges::collapse()
## x dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
## x dplyr::count() masks matrixStats::count()
## x dplyr::desc() masks IRanges::desc()
## x tidyr::expand() masks S4Vectors::expand()
## x dplyr::filter() masks ensembldb::filter(), stats::filter()
## x dplyr::first() masks S4Vectors::first()
## x dplyr::ident() masks dbplyr::ident()
## x dplyr::lag() masks stats::lag()
## x ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## x purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
## x dplyr::rename() masks S4Vectors::rename()
## x dplyr::select() masks ensembldb::select(), AnnotationDbi::select()
## x dplyr::slice() masks IRanges::slice()
## x dplyr::sql() masks dbplyr::sql()
#code to get the lincs data and plot that I've been using
lincs_human <- gess_lincs(qsig_lincs_human, sortby="NCS", tau=TRUE)
result(lincs_human)[1:50,]
## # A tibble: 50 × 15
## pert PCID cell type trend WTCS WTCS_Pval WTCS_FDR NCS Tau
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 BRD-K863366… 546541… PHH trt_… down -0.617 0.0000167 9.53e-5 -1.77 -99.9
## 2 BRD-K904461… 445011… PC3 trt_… up 0.641 0.0000156 9.53e-5 1.75 99.4
## 3 PD-173074 1401 PHH trt_… up 0.619 0.0000166 9.53e-5 1.73 99.3
## 4 garcinol 737075… NPC trt_… down -0.609 0.0000170 9.53e-5 -1.73 -99.5
## 5 GW-3965 447905 SKB trt_… up 0.636 0.0000158 9.53e-5 1.73 99.4
## 6 BRD-A391216… 2793096 MCF7 trt_… down -0.627 0.0000162 9.53e-5 -1.71 -99.6
## 7 ASN-05257430 650361 A549 trt_… down -0.608 0.0000171 9.53e-5 -1.71 -99.9
## 8 thiethylper… 5440 MCF7 trt_… down -0.616 0.0000167 9.53e-5 -1.68 -100.
## 9 geranylgera… 5281365 MCF7 trt_… down -0.616 0.0000167 9.53e-5 -1.68 -99.7
## 10 BRD-A741349… 3950572 HA1E trt_… down -0.632 0.0000160 9.53e-5 -1.68 -99.8
## # … with 40 more rows, and 5 more variables: TauRefSize <int>, NCSct <dbl>,
## # N_upset <int>, N_downset <int>, t_gn_sym <chr>
lincs_res <- result(lincs_human)
drugs_lincs_top <- unique(lincs_res$pert)[1:20]
head(drugs_lincs_top)
## [1] "BRD-K86336638" "BRD-K90446118" "PD-173074" "garcinol"
## [5] "GW-3965" "BRD-A39121618"
#remove BRD pertubagens
non_BRD_perts <- str_remove_all(drugs_lincs_top, "BRD-")
lincs_prelim_plot <- signatureSearch::gess_res_vis(lincs_res, drugs = non_BRD_perts, col = "NCS") +
labs(x = "Drug Perturbation", y = "Normalized Connectivity Score") +
labs(title = "Exercise Signature Comparison to LINCS") +
scale_shape_discrete(labels = c("Normal Cell Line", "Cancer Cell Line")) +
theme(panel.grid = element_blank(), axis.text.x = element_text(size = 14), axis.title = element_text(size = 16), title = element_text(size = 18), legend.text = element_text(size = 16)) +
theme(plot.margin = unit(c(1,1,1,.5),"cm"))
lincs_prelim_plot
ggsave(filename = "210706_lincs_pert_plot.pdf", lincs_prelim_plot)
## Saving 7 x 5 in image
table_list = list("CMAP" = result(tsea_kegg_cmap_human), "LINCS" = result(tsea_lincs_up), "gCMAP" = result(dup_rct_res_gcmap_up), "Fisher" = result(dup_rct_res_fisher_over), "Spearman" = result(dup_rct_res_sp_up))
top_pathways <- comp_fea_res(table_list, rank_stat = "pvalue", Nshow = 20) +
labs(x = "Gene Expression Signature \nSearch Method", y = "KEGG Pathways") +
labs(title = "Top Exercise \nSignature Pathways") +
theme(panel.grid = element_blank()) +
theme(panel.grid = element_blank(), axis.text.x = element_text(size = 16), axis.title = element_text(size = 16), title = element_text(size = 18), legend.text = element_text(size = 16)) +
theme(plot.margin = unit(c(.75,.1,.75,.5),"cm")) #top, right, bottom, left
top_pathways
#theme(panel.grid = element_blank(), axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 16), axis.title = element_text(size = 16), title = element_text(size = 18), legend.text = element_text(size = 16)) +
# theme(plot.margin = unit(c(1,1,1.5,1.2),"cm")) + #top, right, bottom, left
ggsave(filename = "210706_SignatureSearch_TopPathways.pdf", top_pathways)
## Saving 7 x 5 in image
Location of final scripts: Locally /Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/
Operations complete: 210604
Versions
installed.packages()[names(sessionInfo()$otherPkgs), "Version"]
## forcats stringr dplyr
## "0.5.1" "1.4.0" "1.0.7"
## purrr readr tidyr
## "0.3.4" "2.0.0" "1.1.3"
## tibble tidyverse org.Hs.eg.db
## "3.1.3" "1.3.1" "3.12.0"
## signatureSearchData EnsDb.Hsapiens.v75 ensembldb
## "1.4.0" "2.99.0" "2.14.1"
## AnnotationFilter GenomicFeatures AnnotationDbi
## "1.14.0" "1.42.3" "1.52.0"
## rhdf5 ExperimentHub AnnotationHub
## "2.34.0" "1.16.1" "2.22.1"
## BiocFileCache dbplyr ggplot2
## "1.14.0" "2.1.1" "3.3.5"
## signatureSearch SummarizedExperiment Biobase
## "1.7.2" "1.20.0" "2.50.0"
## GenomicRanges GenomeInfoDb IRanges
## "1.42.0" "1.26.7" "2.24.1"
## S4Vectors BiocGenerics MatrixGenerics
## "0.28.1" "0.36.1" "1.2.1"
## matrixStats Rcpp
## "0.60.0" "1.0.7"